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We provide a new efficient adaptive algorithm for performing phase estimation that does not 
require that the user infer the bits of the eigenphase in reverse order; rather it directly infers the 
phase and estimates the uncertainty in the phase directly from experimental data. Our method is 
highly flexible, recovers from failures, and can be run in the presence of substantial decoherence and 
other experimental imperfections and is as fast or faster than existing algorithms. 


I. INTRODUCTION 


performed on the circuit 


Eigenvalue estimation has been a cornerstone of 
physics since the dawn of spectroscopy. In recent years, 
ideas from quantum information have revolutionized the 
ways that we estimate these values by providing methods 
that require exponentially fewer experiments than sta¬ 
tistical sampling and total experimental time that sat¬ 
urates the Heisenberg limit, which is the best possible 
scaling allowed by quantum mechanics. This quantum 
approach, known as phase estimation (PE), is crucial to 
acheive many of the celebrated speedups promised by 
quantum computing [1-5] and as such optimizing these 
algorithms is especially important for present day exper¬ 
imental demonstrations of such algorithms because the 
number of quantum gates that can be performed is often 
severely limitted by decoherence. 

The most popular approach to PE is known as iterative 
phase estimation (IPE), as it forgoes the use of quantum 
resources to infer the eigenvalues of unitary matrix in fa¬ 
vor of using a classical inference algorithm to estimate 
the eigenvalues from experimental data [6-9]. Although 
the methodologies used for inference stretch back to the 
nineties, the ongoing revolution in machine learning that 
has lead to a plethora of improved classical inference pro¬ 
cedures. This raises an important question: can these 
ideas be used to speed up phase estimation or make it 
more robust to experimental error? 

We answer this in the affirmative by providing a new 
classical inference method, inspired by recent work on 
particle filter methods, that is tailored to phase estima¬ 
tion. Our method not only provides performance advan¬ 
tages over existing iterative methods [6, 7], but is also 
robust to experimental imperfections such as depolariz¬ 
ing noise and small systematic errors. 

Before going into detail about our algorithm we will 
first review the phase estimation circuit and Bayesian 
approaches to PE. We then show that rejection sampling 
can be used to efficiently approximate Bayesian inference, 
thereby allowing PE to inherit the speed and robustness 
of Bayesian approaches while retaining the efficiency of 
traditional methods. 

Iterative PE infers the eigenvalue of a given eigenvector 
of a unitary matrix U from a set of experiments that are 
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where U ]</>) = e 1 ^ \<p) for an unknown eigenphase </> £ R, 
and where Z(M6) = e' Mez is a rotation that can be 
used to compare the eigenphase of U to a known refer¬ 
ence value. Most PE algorithms use this circuit to infer 
the bits in a binary expansion of <f> in order from least 
significant to most significant. The process is can be 
made near optimal (up to log* factors [9]) and affords an 
efficient inference method for these bits. 


II. BAYESIAN PHASE ESTIMATION 


Bayesian phase estimation, as introduced by Svore et 
al. [9] , involves performing a set of experiments and then 
updating the prior distribution using Bayes’ rule. For 
example, if an experiment is performed with using M 
repetitions of U , Z(M9) and a measurement outcome 
E € {0,1} is observed then Bayes’ rule states that the 
posterior probability distribution for (f> after observing the 
datum is 
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The final ingredient that is needed to perform an up¬ 
date is the likelihood function P(0\(f>', 6, M). For phase 
estimation, 


p(o\4>-,e,M) = 
p(i\4>-,e,M) = 


1 + cos (M[(j) + 0]) 
2 ’ 
1 — cos (M[4> + 0]) 


( 2 ) 


After using (1) to update the posterior distribution we 
then set the prior distribution to equal the posterior dis¬ 
tribution. This process is then repeated for each of the 
random experiments in the data set. 

Unlike conventional methods, Bayesian inference re¬ 
turns a posterior distribution over the phase. The mean 
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Experiment number 

FIG. 1. Median errors in phase estimation for 10 000 random 
initial choices of the true eigenphase. 


and standard deviation of this distribution provide an es¬ 
timate of the true eigenvalue and the algorithm’s uncer¬ 
tainty in that value. More sophisticated estimates of un¬ 
certainty, such as credible regions, can also be extracted 
from the posterior distribution [10, 11]. Estimates of 
the uncertainty are crucial here because we use them to 
design highly informative experiments and also because 
they allow the protocol to be terminated when a thresh¬ 
old accuracy has been reached. 


III. APPROXIMATE BAYESIAN PHASE 
ESTIMATION 

Exact Bayesian inference is impossible if the eigen- 
phases are continuous so approximations are needed to 
make the procedure tractable. Rather than naively dis- 
critizing the prior distribution, modern methods dis¬ 
cretize by sampling from the prior and then perform 
Bayesian inference on the discrete set of samples (often 
called “particles”). These particle filter methods meth¬ 
ods are quite powerful and have become a mainstay in 
computer vision and machine learning [12-14]. 

Despite the power of these methods, they are often 
hard to implement and even harder to deploy in a mem¬ 
ory limited environment such as on an embedded con¬ 
troller or an FPGA. With the increasing use of FPGAs in 
the control of quantum information experiments [15-17], 
overcoming this limitation presents a significant advan¬ 
tage to modern experiment design. 

We propose a much simpler approach that we call Re¬ 
jection Filtering Phase Estimation (RFPE). Rather than 
using a set of hypotheses that implicitly define a model 
for the system, we posit a prior model and directly up¬ 
date it to find a model for our posterior distribution. We 
achieve this by using a Gaussian with mean fj, and vari¬ 
ance a 2 to model our initial prior, perform a Bayesian 
update on samples drawn from the distribution and then 
refit the updated samples to a Gaussian. This strategy 
is used in a number of particle filter methods such as 
the extended Kalman filter and assumed density filter¬ 



Median Applications of U 

FIG. 2. Comparison of RFPE to ITPE for t = 10 000 with 
100 samples for rptme = 2nk/t at each measurement. 

ing [12, 18]. We further optimize this process by ap¬ 
proximating the Bayesian inference step using rejection 
sampling. This can reduce the memory required by a 
factor of 1 000 or more, as we need only consider a single 
sample at a time. Our algorithm is described below and 
pseudocode is given in Appendix E. 

1. Perform experiment for given 9 , M and observe 
outcome E £ {0,1}. 

2. Draw m samples from M{4>\n, ct 2 ). 

3. For each sample, <j>j, assign <f>j to $ accept with prob¬ 
ability P(E\<j>j-,9,M)/KE, where ke £ (0,1] is a 
constant s.t. P{E\(f>j\6 , M)/ke < 1 for all 4>j,E. 

4. Return p = E(<f> accept ) and a = accept)- 

The resultant samples are equivalent to those drawn 
from the posterior distribution P(<j>\E; M , 6). To see this, 
note that the probability density of a sample being ac¬ 
cepted at <j) = (j)j is P{E\cj>\ 9, a 2 ). Eqn (1) 

then implies 

P(E\(/>-, 9, a 2 ) oc P((j)\E ; 0, M), (3) 

which implies that the accepted samples are drawn from 
the posterior distribution. 

Although it is difficult to concretely predict the value 
of m needed to make the error in the inference small, we 
show in Appendix C that m must scale at least as the in¬ 
verse square of the relative fluctuations in the likelihood 
function. Similarly, Markov’s inequality shows that m 
must scale at least as ke/ f P{E\cj>; 9, M)P(<j>)d<j> to en¬ 
sure that, with high probability, the mean can be accu¬ 
rately estimated from the accepted samples. We intro¬ 
duce ftg to compensate for small expected likelihoods. 

The main issue that remains is how to optimally choose 
the parameters 9 and M. One approach is to locally opti¬ 
mize the Bayes risk [10], but the resulting calculation can 
be too expensive to carry out in online experiments that 
provide experimental results at a rate of tens of mega¬ 
hertz or faster. Fortunately, the particle guess heuristic 
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FIG. 3. Median errors for phase estimation in decohering systems for experiments constrained to use M < T 2 (left) and 
M < T 2 /IO (right). We take m = 12 000 and use 1 000 random samples to generate the data above. The initial state is taken 
to be a randomly chosen eigenstate in all cases. 


(PGH) can give an expedient and near-optimal experi¬ 
ment for this class of likelihood functions [19], 


M = 


'1.25' 

<7 




(4) 


This shows that all three methods tradeoff experimen¬ 
tal and computational resources differently. The latter 
point is especially salient because ITPE is inefficient un¬ 
like RFPE. 


The factor of 1.25 comes from optimizing the cost of 
RFPE. Non-integer M are appropriate if U = e~ lHM . 

Figure 1 shows the error incurred using RFPE. The 
most obvious feature is that the error shrinks exponen¬ 
tially with the number of experiments (which is propor¬ 
tional to the evolution time under the PGH) for m > 100. 
Roughly 150 experiments are needed for the method to 
provide 32 bits of accuracy in the median case. We dis¬ 
cuss the scaling in the mean in Appendix A. 

The number of experiments needed to reach error e 
scales as 0(log(l/e)) rather than 0( log(l/e) loglog(l/e)) 
in Kitaev’s method [6, 7]. Concretely, after 150 experi¬ 
ments the median error for Kitaev’s PE algorithm (with 
s = 10) is roughly 10 million times that of RFPE. 

Although the number of experiments needed is small, 
the experimental time need not be. If the error shrinks 
as e € Q(e~ XN ) where N is the experiment number then 
T exp G 0(£^=7e^) e 0(e AAr ““) G 0( 1/e). The time 
required saturates the Heisenberg limit, up to a multi¬ 
plicative constant. Specifically, A ss 0.17 for RFPE. 

Figure 2 compares RFPE to the information-theory 
PE (ITPE) method of Svore et al. Although ITPE is 
inefficient and is non-adaptive, it is exact and so it is a 
natural benchmark to compare RFPE against. We find 
ITPE requires nearly five times the applications of U if 
<f> = 2irk/t for integer k < t and t = 10 000. 

Conversely, ITPE requires only 25 measurements to 
identify the phase with 50% probability whereas RFPE 
requires 51 experiments if k is an integer. If the true 
value of k is real-valued and ITPE is left unmodified, 
then ITPE fails to learn in the median because the long 
evolution times chosen lead to contradictory possibili¬ 
ties which causes ITPE to become confused. We correct 
this by choosing M —>• |”M/2] in such cases, which in¬ 
creases the number of experiments to 35 but also reduces 
the experimental time below that of unmodified ITPE. 


A. Phase estimation with depolarizing noise 


A criticism that has been levied lately at PE meth¬ 
ods is that they can be impractical to execute on non¬ 
fault tolerant quantum hardware [20-22]. This is be¬ 
cause phase estimation attains its quadratic advantage 
over statistical sampling by using exponentially long evo¬ 
lutions. Decoherence causes the resultant phases to be¬ 
come randomized as time progresses, ultimately resulting 
in liniM->.oo -P(0|<^; 6, M) = 4. In this limit, the mea¬ 
surements convey no information. It similarly may be 
tempting to think that decoherence fundamentally places 
a lower limit on the accuracy of PE. Bayesian infer¬ 
ence to can, however, estimate 4> using experiments with 
Af»T 2 [10]. 

We model the effect of decoherence on the system by 
assuming the existence of a decoherence time T 2 such that 

P(O|0)=e- M / T ^ 1 ^° S 

Pm = + 1 - C 2 M/T2 , 

(5) 

This model is appropriate when the time required to im¬ 
plement the controlled operation A(C7) is long relative the 
that required to perform H and an arbitrary .Z—rotation, 
as is appropriate in quantum simulation. For simplicity, 
in the following we assume that X 2 is known. 

Since T 2 places a limitation on our ability to learn, we 
propose a variation to (4), 

M = min 

The experiments yielded by (6) are qualitatively similar 
to locally optimized experiments for frequency estima- 
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FIG. 4. Instantaneous estimate of the eigenphase for a system with 16 eigenvalues, A = 0, t = 0.1 and T 2 = 10 4 . 


tion which are chosen to saturate the Cramer-Rao bound 
[23]; however, (6) requires nearly 100 fold less comput¬ 
ing time to select an experiment. We choose M to cut 
off at T 2 in part because the locally optimized solutions 
choose to saturate at this value which can be under¬ 
stood using the following argument. The Cramer-Rao 
bound for frequency estimation scales as 0{M ~ 2 ) [24] 
in the absence of decoherence. Eqn. (5) suggests that 
decoherence causes the posterior variance to increase as 
0(exp(2M/T 2 )). As in the frequency estimation case 
[23], calculus reveals that M = T 2 optimally trades off 
these tendencies, which justifies (6). 

Figure 3 shows that RFPE smoothly transitions be¬ 
tween the exponential scaling expected at short times 
and the polynomial scaling expected when decoherence 
becomes significant. The error scales roughly as 1/N 0 6 
in this polynomial regime, which is comparable to the 
1/y/N scaling expected from statistical sampling. Deco¬ 
herence therefore does not necessarily impose a funda¬ 
mental limitation on the accuracy that PE can achieve. 
Figure 3 also shows that confining the experiments to 
stay in a more coherent regime also can actually hurt the 
algorithm’s ability to infer the phase, as expected. 


IV. TRACKING EIGENPHASES 

Figure 3 shows the performance of RFPE when the 
initial quantum state is an eigenstate and is discarded 
after each experiment. Performing phase estimation in 
this way minimizes the number of experiments, but can 
be prohibitively expensive if preparation of the initial 
eigenstate is prohibitively costly. In such cases, it makes 
sense to follow the standard prescription for phase es¬ 
timation by keeping the quantum state until it is clear 
that the initial eigenstate has been depolarized. These 
depolarizations can cause RFPE to become confused be¬ 
cause the new data that comes in is only consistent with 
hypotheses that have been ruled out. We address this 
by performing inexpensive experiments to assess whether 
the state has depolarized and then restart the learning 
process. Restarting can even be valuable when T 2 = 0 
if the input state is a superposition of eigenvectors or to 


recover if the RFPE becomes stuck (see Appendix A). 

The following procedure addresses this issue in cases 
where the spectral gaps are promised to be at least A. 

1. After each update with probability e _M / T2 perform 
an experiment with 9 = /i and M = r /a for r < 1. 

2. If result = 1 then prepare initial state and reset <7. 

3. After restart, continue as normal until a < A then 
set er and /i to those of the closest eigenvalue. 


Steps 1 and 2 perform a one-sided test of whether the 
prior distribution is consistent with the current state. 
If the prior probability distribution is correct then the 
probability of measuring 0 is 




/ (g-a;)r\ (^-*> 2 1 + e 1-2 / 2 

V 2a J 2 


( 7 ) 

If r = 0.1, the probability of measuring 0 is approx¬ 
imately 0.998 and hence measuring 1 implies that the 
hypothesis that the prior is correct can be rejected at 
p < 0.002. A Bayesian analysis of our reset rule is given 
in Appendix D. 


Step 3 restarts the learning process if the learning pro¬ 
cess fails. It is important, however, to not throw away 
the spectral information that has been learned prior to 
the restart. Step 3 reflects this by checking to see if 
the current estimate of the eigenphase corresponds to a 
known eigenstate and then sets p to be the estimate of 
the eigenvalue and a to be its uncertainty if such an iden¬ 
tification is made. This allows the algorithm to resume 
learning once the depolarized state is projected onto a 
known eigenstate. 

The test process also permits the eigenvalue of an 
eigenstate in a decohering system to be estimated in real 
time. Figure 4 shows that the restarting algorithm can 
rapidly detect a transition away from the instantaneous 
eigenstate and then begin inferring the eigenvalue of the 
system’s new instantaneous eigenstate. 
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V. CONCLUSION 


Our work makes phase estimation more experimentally 
relevant by substantially reducing the experimental time 
required and also by making the process resilient to deco¬ 
herence. This is especially critical as current experiments 
push past the classical regime and IPE becomes increas¬ 
ingly impractical in lieu of fault-tolerance. In particular, 
the ability of our algorithm to learn in the presence of 
decoherence provides an efficient alternative to the vari¬ 
ational eigensolvers used in present day experiments [20- 
22 ], 


Looking at the problem of phase estimation more gen¬ 
erally, it is clear that it is in essence an inference problem. 
Our work shows that modern ideas from machine learn¬ 
ing can be used therein to great effect. It is our firm belief 
that by combining the tools of data science with clever 
experimental design, further improvements can be seen 
not only in phase estimation but also quantum metrology 
in general. 
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FIG. 5. Cumulative distribution function of probability that PE error is less than x after 150 updates for m = 50 (left) 
m = 100 (middle) m = 200 (right). 


Appendix A: Variance Reduction Strategies 

An important drawback of RFPE is the tails for the distribution of errors in phase estimation can be quite fat in 
typical applications, meaning that there is significant probability that the error in the inferred eigenphase is orders 
of magnitude greater than the median. Such a distribution can be seen in Figure 5 where we see that although the 
median error is roughly 10 -1 ° radians after 100 updates for m > 50 a non- negligible fraction of the experiments have 
error on the order of 1 . 

Fortunately, the need to always repeat the algorithm and use a majority voting scheme to reduce the variance of 
the estimate is mitigated by the fact that the algorithm outputs a which estimates the uncertainty in the resultant 
eigenphase. Alternatively, less aggressive experiments can be chosen in the guess heuristic or multi-modal models for 
the prior distribution (such as Gaussian mixture models) can be used. The later approach can be quite valuable in 
computer vision and machine learning, however here we have an additional freedom not typically enjoyed in those 
fields: we can perform adaptive experiments to test to see if our current hypothesis is correct. 

The central idea behind our restarting strategy is to examine the decay of a with the number of experiments. In 
ideal cases, the error decays exponentially which means that it is easy to see when the algorithm fails by plotting a on 
a semi- log plot. Such intuition can be easily automated. The idea behind our restarting algorithm is to first estimate 
the derivative of log(cr) with respect to the experiment number and determine whether it is less than a threshold. If 
it is less than the threshold, perform an experiment to test to see if the value of /r that has been learned so far is 
accurate (as per our incoherent phase estimation algorithm). If it is found to be innacurate then restart the algorithm 
and abandon the information learned so far. The restarting portion of this algorithm is given in Algorithm 3. 

When a restarting strategy like this is employed, we need to modify the model selection criteria. Previously, we 
used the value of /i yielded by the most recent experiment as our estimate. Given that a restart late in the algorithm 
can reset all information learned about a model, it makes sense in this context to use the value of /r corresponding to 
the smallest a observed. This corresponds to the model that the inference algorithm has the greatest certainty. 

We see in Figure 6 that this strategy of restarting substantially reduces the weights of the tails. In fact, the 
mean error in the inference falls from 0.0513 radians to 1.08 x 10~ b radians. This shows that this resetting strategy 
substantially reduce the probability of a large error occuring in the estimate of the eigenphase. To our knowledge, 
this data represents the first time that a heuristic method has been successful in reducing the mean error in frequency 
estimation to an acceptable level of uncertainty [10]. Previous approaches that succeeded in making the mean-error 
small used costly numerically optimized experiments, which render such approaches impractical for online phase 
estimation [10, 19, 23-25]. 

Note that Kitaev’s phase estimation algorithm also can have a high probability of failure throughout the algorithm, 
so one might wonder if these restarting strategies might also be of value for traditional phase estimation algoirthms. 
Unfortunately, these strategies cannot be easily translated to Kitaev’s PE algorithm because the most expensive 
experiments are performed first. Thus, restarting does not allow the algorithm to combat decoherence, nor can it be 
used to inexpensively fix situations where one of the bits is inferred improperly. 


Appendix B: Stability Against Errors in the Likelihood Function 

Our algorithm is not only robust against well known depolarizing noise but is robust against uncharacterized 
noise sources as well. We demonstrate this in Figure 7 wherein we introduce depolarizing noise of strength 7 to 
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logio( error ) logio(error) 


FIG. 6 . Cumulative distribution function of probability that PE error is less than x after 200 updates for m = 2000 for F = oo 
(left) T = 0.1 (right) and r = 0.1. Estimate of CDF consists of 1000 random eigenphase inference problems with T 2 = 00 . 



FIG. 7. Errors in inference for phase estimation with different levels of un-modeled noise, 7 , for T 2 = 1000. In each instance 
the initial state was taken to be a fixed, but unknown, eigenstate of the Hamiltonian. The median was found for 100 randomly 
chosen values of the eigenphase of this fixed initial eigenstate. 


our experimental system, but do not include such noise in the likelihood function. For example, with 7 = 0.4, the 
measurement outcome of phase estimation is replaced with a random bit with probability 40%. Although this may 
seem qualitatively similar to the numerical examples for depolarizing noise considered in the main body of the paper, 
this case is much more pathological because it is only used to generate the experimental data and is not permitted in 
the model of the system used in the likelihood function. This raises the possibility that the algorithm could become 
confused due to experimental results that are fundamentally inconsistent with the assumed likelihood function for the 
system. 

Figure 7 shows that the inclusion of uncharacterized depolarizing noise does not actually prevent the eigenvalue 
from being estimated. Rather it reduces the number of bits per experiment that the algorithm can infer. We see this 
by fitting the exponential part of the data in Figure 7 (along with similar data for 7 = 0.1, 7 = 0.3, 7 = 0.5) and 
find that the error decay exponent, A, to shrink as roughly A ss 0.17e -317 it does not prevent our algorithm from 
learning at an exponential rate (until depolarizing noise becomes significant). Thus RFPE continue to work even in 
the presence of depolarizing noise sources that are both strong and uncharacterized. 


Appendix C: Stability of Rejection Filtering PE 

One way in which the rejection sampling algorithm can break down is if the likelihood function becomes too flat 
relative to the number of discrete samples, m, drawn from the prior distribution. This breakdown occurs because 
the sample variance in the posterior mean is much greater than the difference between the prior mean and posterior 
means, which are not expected to be approximately the same if the likelihood function has little variation. This in 
effect implies that the dynamics of the mean will essentially be a random walk and as a result we do not expect that 
our rejection sampling method to perform well if the likelihood function is too flat. 

In practice, the flatness of the distribution can be reduced by batching several experiments together and processing 
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them. This process is in general inefficient, but can be made efficient if an appropriate value of ke is known for each 
datum recoreded. 

The question remaining is: how flat is too flat? We show in the theorem below that if the number of samples taken 
from the true posterior distribution does not scale at least inverse quadratically with the scale of relative fluctuations 
in the likelihood function then we expect the posterior variance to be much greater than the shift in the means. 

Theorem 1. Assume that for all j P(E\ X j) = a + 6j where |5j| < S and a > 105 and assume that Xj ~ P{x 3 ). If we 
then define Ho := Ej P( x j) x j> Mi := Ej P{ x j\E)x 3 an d °i t° be the posterior variance then |mi — Mol G 0(<Ji/y / rn) 
only if e 0(m). 

Proof. Bayes’ rule gives us 


E p ( Xj\E)xj - no 

i 


Ej P(E\xj)P(xj){xj - ho) 


Ej d 3 P{x 3 ){x 3 - ho) 

Ej P{E\x 3 )P{x 3 ) 


Ej P(E\x 3 )P(x 3 ) 


Then using the Cauchy-Schwarz innequality, the triangle inequality and a > 26. 


E j djP{xj){x 3 - Ho) 

< 

S y/T,j p (xj)\xj ~ Mol 2 

Ej P{E\xj)P{ Xj ) 


a — 6 


6a 26a 
— 6 ~ a 


(Cl) 


(C2) 


Thus the maximum shift in the posterior mean shrinks as the the likelihood function becomes increasingly flat, as 
expected. 

Next we need to lower bound the posterior variance in order ensure that the value of m chosen suffices to make the 
error small in the best possible case. To do so we use the reverse triangle inequality: 

a\ = \a\ — a 2 + a 2 \ > a 2 — \af — a 2 \. (C3) 

Thus it suffices to upper bound |crf — a 2 | to lower bound a\. To do so, note that a > 26 and hence 


\ p ( x j\E) - P( x j)\ 


P( x j){6j - Ej p ( x j)$j) 

« + Ej p ( x j) s j 


P(x 3 )26 < P(x 3 )46 
a — 6 ~ a 


(C4) 


Now the difference between the two variances can be written as 


2 2 I _ 

a, — a — 


< 


J2 P (xj\ E ){xj - Hi) 2 - P(xj){xj - Hof 


E^l^) “ P ( x j))( x j ~ Mi ) 2 


E P ( x i) ((^ “ Mo) 2 - ( x j - Mi) 2 ) 


< 


66 


E p (^)(^ _ mi ) 5 


+ (Mi — Mo) 2 


46a 2 46 

< - + — 

a a 


E P( x M x i ~ Ml) 2 - (Xj - Ho) 2 } 


+ (Mi — Mo Y 


46a 2 45 2 45cr 2 12 6 2 a 2 105er 2 

< -+ (1 + — )(Mi - Mo) 2 < -+- 2 — < - 

a a a or a 


(C5) 


Thus we have that 


a\ > cr 2 (l — 105/a). 


(C6) 


Now assuming 5 < a /10 we have 


a 2 e H{a 2 ). 


(C7) 
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Finally, we note that 


|Mi - Mol G £l{<Ji/y/rn) =>■ — G £l{a/y/m), (C8) 

a 

which is only true if m G S2(a 2 /<!> 2 ). D 

We therefore see that the number of samples needed to track the small changes in a posterior distribution that 
happens when the likelihood function becomes extremely flat. This condition is not sufficient because the actual 
components of the posterior mean may be shifted by a much smaller amount than the upper bounds used in the proof 
of Theorem 1 . 

In contrast, exact Bayesian inference requires a number of bits that scales as 0(log(l/<5)) (assuming a fixed and 
discrete number of hypotheses). Thus exact Bayesian inference (or to a lesser extent particle filter methods) can 
be preferable in cases where the likelihood function is extremely flat. Such concerns can be somewhat be either be 
avoided entirely or batches of such experiments should be combined to produce a likelihood function that is much less 
flat and choosing an appropriate instrumental distribution to ensure that the success probability remains high. 


Appendix D: Bayes Factors for Reset Rule 


Though in the main body, we have chosen to present the reset step in terms of p -values for familiarity, we note that 
p -values are difficult to correctly interpret and can lead to misunderstandings [26, 27]. As an alternative, one may 
prefer to use the Bayes factor to test whether the rejection filter has failed. For example, the rejection filter could 
fail due to a failure of the numerical approximation or because the eigenstate has been lost, as described in the main 
body. For a uniform prior over whether the rejection filter has failed, this reduces to the likelihood ratio test 


Pr(result = l|prior wrong) 

Pr(result = l|prior correct) 

1 - e~ (T " <T ^/2g 2 +tTT/r 2 ) cos ( T ^ _ preset) /a) 

1 - e- r2 /2 


(Dla) 

(Dlb) 


where preset and u rese t are the values of fi and a immediately following a reset. Using this test, the degree by which 
L > 1 informs as to the degree to which we should prefer the model proposed by the reset rule. Again under the 
assumption of a uniform prior over the validity of the reset rule, 


Pr(prior wrong|result = 1) = LPr(prior right|result = 1). (D2) 

For instance, if the variance has been reduced by a factor of 100 from its initial value (cr = cr reS et/100) and the current 
mean is correct (p = preset), then assuming T 2 = 100 and an initial standard deviation of <r rese t = ir/VS, L « 8000 
for a result of 1. That is, the initial prior is 8,000 times as probably correct as the current prior in this example. 

Importantly, this example rests on the assumption that one take a uniform prior over whether the current prior 
is valid. This assumption corresponds to that which an impartial observer might take in evaluating whether the 
numerical approximations made by our rejection filter algorithm have failed for the observed data. That is, the reset 
rule proposed corresponds performing an intervention without relying on the full experimental data record. Choosing 
a threshold other than L = 1 represents placing a different prior, as could be informed by observing the failure 
modalities of many runs of the rejection filter method. As demonstrated in Figure 8, because our reset rule resets 
with probability 1 if a 1 is observed, choosing r effectively sets the threshold for L. In practice, however, because 
Pr(result = ljprior correct) ss 0, the reset rule is only weakly dependent on the specific threshold one places on L. 


Appendix E: Pseudocode for Algorithms 

In the main body we sketched the details of our phase estimation algorithm. Here we elaborate on this algorithm 
and discuss some of the subtle details needed to make the algorithm work. The first such subtlety stems from the 
fact that eigenphases are equivalent modulo 27t. To see this problem, consider a Gaussian distribution centered at 
0. If we take the outputs of the distribution in the branch [0,27r] then we find that the mean of the distribution is 
7T rather than 0. Since the support of the initial Gaussian may be small at </> = 7 r, such errors can be catastrophic 
during the inference procedure. This can be dealt by using the circular mean and by working with a wrapped normal 
distribution. This approach is discussed in Algorithm 1. For expedience, we eschew this approach and instead use a 
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O / OVeset 10 1 

O / (Jpgset 10 2 

O / (7 r eset 10 

O / Preset 10 


FIG. 8. Likelihood ratio test values for various settings of the parameter r, and for various prior variances u/(J Tese t. In this 
example, we suppose that p = preset and that T 2 = 100. 


heuristic approach that does not require a large number of trigonometric calls, which can be expensive if the device 
performing the inference does not have native trigonometric calls. 

The heuristic approach, described in Algorithm 2, uses rejection sampling and incremental methods to estimate 
the mean and standard deviation of the posterior distribution. If a <C 27t then the probability distribution is narrow 
and does not suffer from significant wrap around unless n mod 2ir ss 0. We address this by keeping track of each 
of the accepted (f>j as well as <f>j + n. If a -C 1 than it is impossible that both distributions suffer from substantial 
wrap around. The arithmetic, rather than circular, mean and standard deviation are then computed for both using 
an incremental formula and the branch with the smallest variance is kept. If the branch that was shifted by it is 
chosen, then 7 r is subtracted from the reported mean. The standard deviation does not need to be modified because 
it is invariant with respect to displacements of the mean. 

While this approach is correct if a -C 1, it is only approximate if <7 is on the order of 1. In such cases, computation 
of the circular mean is much better justified, however we find that using our heuristic approach continues to provide 
acceptable results while avoiding trigonometric calls that can be expensive in some contexts. An alternative approach 
to solving this problem is to begin each phase estimation run with a batch of random experiments, as per [9], before 
continuing to ensure that the posterior variance is small enough to neglect the wrap around effect. 

The choice of the evolution time and the inversion angle strongly impacts the efficiency of the learning algorithm. 
We provide below code for a modified version of the particle guess heuristic of [19]. As discussed in the main body, we 
expect that choosing M > T 2 will typically lead to worse estimates of the eigenphase because the effects of decoherence 
overwhelm the information that can be gleaned from these long experiments. As a result, we modify the particle guess 
heuristic to never choose M > T 2 . We formally state this procedure in Algorithm 4. 
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Algorithm 1 Bayes Update for RcjF using Directional Statistics 
Input: Prior mean and variance /x, a, measurement E, settings M, 9, number of attempts m, scale ke 
\> Initialize accumulators to 0. 

(*^inc5 Mine? N a) ^ 0* 

l> Attempt each sample, 

for i € 1 -> m do 

-(<j>-fx) 2 /2a 2 

X ~ oU 2 F ’ 

> Accept or reject the new sample. 

u ~ Uniform(0,1) 
if P(E \x) > keu then 

> Accumulate using the accepted sample using Cartesian coordinates. 

*£inc ^ ^inc H - COS^x) 

Vine t- j/inc + sin(a:) 

Na <— N a + 1. 

end if 
end for 

Unc t X{ nc /N a 

Vine t Vine /N a 

> Return mean, variance of the posterior using accumulators. 
n 4r- Arg{x inc Wine)’ 

> Use circular standard deviation to estimate SD for wrapped Gaussian 

return (/x, a) 


Algorithm 2 Bayes Update for RejF 

Input: Prior mean and variance /x, a, measurement E, settings M, 9, number of attempts m, scale ke 

> Initialize accumulators to 0. 

(Mine? Mine’ ^nc> ^inc’ ^a) ^ 0 

> Attempt each sample. 

for i £ 1 -> m do 


> Draw a sample using each “cut” of the prior. 

— (4>-a0 2 /2o- 2 
x 1 

x x mod 2-7T. 
x' <— x + 7r mod 27 t. 

> Accept or reject the new sample. 

u ~ Uniform(0,1) 
if P{E\x) > keu then 


> Accumulate using the accepted sample w/ each “cut.” 


*<— 


Mine 
- Unc 


+ X 
+ X 2 

+ x' 2 


N a <— N a + 1. 


end if 
end for 


> Return mean, variance of the posterior using accumulators. 
fl' <— fJ>inc/N a 

min Wnc-f4c)- \Jwt=~l (h'nc ~ m£c)) 

return (/x', o') 
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Algorithm 3 Restarting algorithm 

Input: Prior RejF state, records of all previous models found in the phase estimation algorithm /./, a, initial standard deviation 
Omit, M, T 2 , a counter CNT, T and t. 

Output: CNT, a 

function Restart(/x, <r,<Tinit, M, T 2 , CNT, T, r) 

D <— derivative of log a. 

if D > T and CNT < 5 or rand()> exp(— M/T 2 ) then > Checks to see if the eigenstate is suspect. 

Perform experiment with M = t/(t and 9 = /x. 

if Outcome is 0 then t> Test concludes state estimate is valid 

CNT e- CNT + 1. 
return CNT, a. 

else > Test concludes state estimate is invalid 

CNT 0 

<J init 

return CNT, a 
end if 

else 0 Does not restart if state is not suspect 

CNT 4 - CNT + 1 
return CNT, a. 
end if 

end function 


Algorithm 4 PGH for decoherent phase estimation using RejF 
Input: Prior RejF state ( 1 , E. Resampling kernel F. 

Output: An experiment (M,9). 
function PGH RejF (/x, S, T 2 ) 

M <- 1.25/VTr(E) 
if M > T 2 then 

M ~ f(x; I/T 2 ) > Draw M from an exponential distribution with mean T 2 . 

end if 

(-0/M)~ F( M ,S) 
return (M,6). 
end function 





